This folder contains the code and data needed to replicate Figure 1 and the main results discussed alongside it. There are three scripts to run, in the order listed below. Note that the second step is computationally intensive and will take at least several hours.

1. "python figure_1_data_prepare.py"
This script samples the data and saves the resulting subsets to folders in ../data/tmp_files/new_topic_number_26/boosting_files

2. "python personal_level_lda_allfour_find_k.py"
This script fits LDA models, which are saved in the same tmp folders as the sub-sampled data generated in step 1.

3. "python figure_1_get_avg_matrics_and_figure.py"
This script computes average distances among Members of Congress, Mayors, and Governors. Results are written to the folder ../data/results/new_topic_number_26
The distances themselves are written to files named according to the convention "a_vs_b_topic_space_4th_no_timelimit.csv", where "a" and "b" are each one of "congress", "governor", or "mayor". The script then uses these files to generate a figure containing the same information as Figure 1 in the paper, named tweets_topic_distances_distribution_4th_no_timelimit_no_soccer_paper_revision.pdf
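
To check which distance files step 3 actually produced, a small helper along the following lines may be useful. It is only a sketch based on the naming convention above; the function name find_distance_files is hypothetical and not part of this repository, and which (a, b) combinations exist is an assumption left for the helper to discover rather than something stated here.

```python
import re
from pathlib import Path

# Fixed parts of the naming convention described in step 3.
SUFFIX = "_topic_space_4th_no_timelimit.csv"
GROUPS = ("congress", "governor", "mayor")

# Matches names such as "congress_vs_mayor_topic_space_4th_no_timelimit.csv".
PATTERN = re.compile(
    r"^(%s)_vs_(%s)%s$" % ("|".join(GROUPS), "|".join(GROUPS), re.escape(SUFFIX))
)


def find_distance_files(results_dir="../data/results/new_topic_number_26"):
    """Map each (a, b) group pair to the distance file found for it.

    Hypothetical helper: it only inspects file names and makes no
    assumption about the columns inside the CSV files.
    """
    found = {}
    for path in sorted(Path(results_dir).glob("*_vs_*" + SUFFIX)):
        match = PATTERN.match(path.name)
        if match:  # skip files whose groups are not in GROUPS
            found[(match.group(1), match.group(2))] = path
    return found
```

Calling find_distance_files() from this folder after step 3 returns a dictionary keyed by group pair, so a missing comparison shows up as a missing key.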

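The three steps above can also be chained from a single script. The following is a minimal sketch, not part of this repository: the names build_commands and run_pipeline are hypothetical, and it assumes the scripts are run from this folder with the same Python interpreter and take no command-line arguments.

```python
import subprocess
import sys

# Script names are taken from the three steps above. The order matters
# because each step reads files written by the previous one.
STEPS = [
    "figure_1_data_prepare.py",
    "personal_level_lda_allfour_find_k.py",  # slow: several hours at least
    "figure_1_get_avg_matrics_and_figure.py",
]


def build_commands(python=sys.executable):
    """Return one command (as an argument list) per step, in run order."""
    return [[python, script] for script in STEPS]


def run_pipeline(dry_run=True):
    """Run the steps in order, stopping at the first failure.

    With dry_run=True (the default) the commands are only printed.
    """
    commands = build_commands()
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError if a step exits with a
            # non-zero status, so later steps never run on incomplete inputs.
            subprocess.run(command, check=True)
    return commands
```

Calling run_pipeline() prints the three commands without executing anything; run_pipeline(dry_run=False) executes them one after another.
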
